Building an Old Occitan Corpus via Cross-Language Transfer

Authors

  • Olga Scrivner
  • Sandra Kübler
Abstract

This paper describes the implementation of a resource-light approach, cross-language transfer, to build and annotate a historical corpus for Old Occitan. Our approach transfers morpho-syntactic and syntactic annotation from resource-rich source languages, Old French and Catalan, to a genetically related target language, Old Occitan. The present corpus consists of three sub-corpora in XML format: 1) raw text; 2) part-of-speech tagged text; and 3) syntactically annotated text.
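As a rough illustration of the idea only, the sketch below learns a most-frequent-tag lexicon from tagged source-language tokens, applies it to target-language tokens, and writes the result as XML. It is not the authors' pipeline: the function names, the tag labels, the `<s>`/`<w>` element names, and the toy tokens are all assumptions made for this example and are not taken from the corpus.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def train_lexicon(tagged_source):
    """Map each source-language word form to its most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_source:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def transfer_tags(lexicon, target_tokens, unknown="UNK"):
    """Tag target-language tokens with the source lexicon; unseen forms get `unknown`."""
    return [(tok, lexicon.get(tok.lower(), unknown)) for tok in target_tokens]


def to_xml(tagged):
    """Serialize one tagged sentence as <s><w pos="...">token</w>...</s> (hypothetical schema)."""
    sentence = ET.Element("s")
    for tok, tag in tagged:
        word = ET.SubElement(sentence, "w", pos=tag)
        word.text = tok
    return ET.tostring(sentence, encoding="unicode")


# Toy source-language training data (illustrative, not from any real corpus).
source = [("amor", "NOUN"), ("e", "CONJ"), ("de", "PREP"), ("e", "CONJ")]
lexicon = train_lexicon(source)

# Toy target-language sentence; "joi" is unseen in the source data.
tagged = transfer_tags(lexicon, ["Amor", "e", "joi"])
xml_sentence = to_xml(tagged)
print(xml_sentence)
```

Word forms shared between related languages receive a tag directly, while unseen forms fall back to a placeholder that would need a better model or manual correction. A real system would use a trained statistical tagger or parser rather than a lookup table.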


Related resources

Tools for Digital Humanities: Enabling Access to the Old Occitan Romance of Flamenca

Accessing historical texts is often a challenge because readers either do not know the historical language or face a technological hurdle when such texts are available digitally. Merging corpus linguistic methods and digital technology can provide novel ways of representing historical texts digitally and providing simpler access. In this paper, we describe a multi-dimensi...


A Lexicon for Old Occitan Medico-Botanical Terminology in Lemon

The article presents the adaptation of the lemon model (a model for lexica as RDF data) for a multilingual and multi-alphabetical lexicon of Old Occitan medico-botanical terminology. The lexicon is the core component of an ontology-based information system that will be constructed and implemented within the DFG-funded project "Dictionnaire de Termes Médico-botaniques de l’Ancien Occitan" (DiTMA...


BaTelÒc: A Text Base for the Occitan Language

Language Documentation, as defined by Himmelmann (2006), aims at compiling and preserving linguistic data for studies in linguistics, literature, history, ethnology, and sociology. This initiative is vital for endangered languages such as Occitan, a Romance language spoken in southern France and in several valleys of Spain and Italy. The documentation of a language concerns all its modalities, cove...


O. Scrivner, T. Gilmanov. SWIFT Aligner: A Tool for the Visualization and Correction of Word Alignment and for Cross-Language Transfer

It is well known that parallel corpora are valuable linguistic resources. One of the benefits of such corpora is that they allow for building an annotated corpus for resource-poor languages via cross-language transfer. That is, given accurate alignment between a word from a source language and its equivalent in a target language, some linguistic information, such as part-of-speech tags or sy...


Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation

The development of a natural language speech application requires the process of semantic annotation. Moreover, multilingual porting of speech applications increases the cost and complexity of the annotation task. In this paper we address the problem of transferring the semantic annotation of the source language corpus to a low-resource target language via crowdsourcing. The current crowdsourcin...




Publication date: 2012